 dynamic convolution




DESign: Dynamic Context-Aware Convolution and Efficient Subnet Regularization for Continuous Sign Language Recognition

Liu, Sheng, Yu, Yiheng, Feng, Yuan, Xu, Min, Jin, Zhelun, Jiang, Yining, Yuan, Tiantian

arXiv.org Artificial Intelligence

Although dynamic convolutions are ideal for continuous sign language recognition (CSLR), they mainly focus on spatial modeling and fail to capture temporal dynamics and contextual dependencies. To address this, we propose DESign, a novel framework that incorporates Dynamic Context-Aware Convolution (DCAC) and Subnet Regularization Connectionist Temporal Classification (SR-CTC). DCAC dynamically captures the inter-frame motion cues that constitute signs and uniquely adapts convolutional weights in a fine-grained manner based on contextual information, enabling the model to better generalize across diverse signing behaviors and boost recognition accuracy. Furthermore, we observe that existing methods still rely on only a limited number of frames for parameter updates during training, indicating that CTC learning overfits to a dominant path. To address this, SR-CTC regularizes training by applying supervision to subnetworks, encouraging the model to explore diverse CTC alignment paths and effectively preventing overfitting. A classifier-sharing strategy in SR-CTC further strengthens multi-scale consistency. Notably, SR-CTC introduces no inference overhead and can be seamlessly integrated into existing CSLR models to boost performance. Extensive ablations and visualizations further validate the effectiveness of the proposed methods. Results on mainstream CSLR datasets (i.e., PHOENIX14, PHOENIX14-T, CSL-Daily) demonstrate that DESign achieves state-of-the-art performance.


KernelDNA: Dynamic Kernel Sharing via Decoupled Naive Adapters

Huang, Haiduo, Zhang, Yadong, Ren, Pengju

arXiv.org Artificial Intelligence

Dynamic convolution enhances model capacity by adaptively combining multiple kernels, yet faces critical trade-offs: prior works either (1) incur significant parameter overhead by scaling kernel numbers linearly, (2) compromise inference speed through complex kernel interactions, or (3) struggle to jointly optimize dynamic attention and static kernels. We also observe that pre-trained Convolutional Neural Networks (CNNs) exhibit inter-layer redundancy akin to that in Large Language Models (LLMs). Specifically, dense convolutional layers can be efficiently replaced by derived "child" layers generated from a shared "parent" convolutional kernel through an adapter. To address these limitations and implement the weight-sharing mechanism, we propose a lightweight convolution kernel plug-in, named KernelDNA. It decouples kernel adaptation into input-dependent dynamic routing and pre-trained static modulation, ensuring both parameter efficiency and hardware-friendly inference. Unlike existing dynamic convolutions that expand parameters via multi-kernel ensembles, our method leverages cross-layer weight sharing and adapter-based modulation, enabling dynamic kernel specialization without altering the standard convolution structure. This design preserves the native computational efficiency of standard convolutions while enhancing representation power through input-adaptive kernel adjustments. Experiments on image classification and dense prediction tasks demonstrate that KernelDNA achieves a state-of-the-art accuracy-efficiency balance among dynamic convolution variants. Our code is available at https://github.com/haiduo/KernelDNA.


Frequency Dynamic Convolution for Dense Image Prediction

Chen, Linwei, Gu, Lin, Li, Liang, Yan, Chenggang, Fu, Ying

arXiv.org Artificial Intelligence

While Dynamic Convolution (DY-Conv) has shown promising performance by enabling adaptive weight selection through multiple parallel weights combined with an attention mechanism, the frequency responses of these weights tend to be highly similar, resulting in high parameter costs but limited adaptability. In this work, we introduce Frequency Dynamic Convolution (FDConv), a novel approach that mitigates these limitations by learning a fixed parameter budget in the Fourier domain. FDConv divides this budget into frequency-based groups with disjoint Fourier indices, enabling the construction of frequency-diverse weights without increasing the parameter cost. To further enhance adaptability, we propose Kernel Spatial Modulation (KSM) and Frequency Band Modulation (FBM). KSM dynamically adjusts the frequency response of each filter at the spatial level, while FBM decomposes weights into distinct frequency bands in the frequency domain and modulates them dynamically based on local content. Extensive experiments on object detection, segmentation, and classification validate the effectiveness of FDConv. We demonstrate that when applied to ResNet-50, FDConv achieves superior performance with a modest increase of +3.6M parameters, outperforming previous methods that require substantial increases in parameter budgets (e.g., CondConv +90M, KW +76.5M). Moreover, FDConv seamlessly integrates into a variety of architectures, including ConvNeXt and Swin-Transformer, offering a flexible and efficient solution for modern vision tasks. The code is made publicly available at https://github.com/Linwei-Chen/FDConv.


KernelWarehouse: Rethinking the Design of Dynamic Convolution

Li, Chao, Yao, Anbang

arXiv.org Artificial Intelligence

Dynamic convolution learns a linear mixture of n static kernels weighted with their input-dependent attentions, demonstrating superior performance to normal convolution. However, it increases the number of convolutional parameters by n times, and thus is not parameter efficient. As a result, no prior work has explored the setting n>100 (an order of magnitude larger than the typical setting n<10) to push forward the performance boundary of dynamic convolution while remaining parameter efficient. To fill this gap, in this paper, we propose KernelWarehouse, a more general form of dynamic convolution, which redefines the basic concepts of "kernels", "assembling kernels" and "attention function" through the lens of exploiting convolutional parameter dependencies within the same layer and across neighboring layers of a ConvNet. We validate the effectiveness of KernelWarehouse on the ImageNet and MS-COCO datasets using various ConvNet architectures. Intriguingly, KernelWarehouse is also applicable to Vision Transformers, and it can even reduce the model size of a backbone while improving the model accuracy. For instance, KernelWarehouse (n=4) achieves 5.61%|3.90%|4.38% absolute top-1 accuracy gain on the ResNet18|MobileNetV2|DeiT-Tiny backbone, and KernelWarehouse (n=1/4) with 65.10% model size reduction still achieves 2.29% gain on the ResNet18 backbone. The code and models are available at https://github.com/OSVAI/KernelWarehouse.
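As a rough illustration of the mixture described in the abstract above, the following sketch builds one kernel per input as an attention-weighted sum of n static kernels and then applies an ordinary convolution with it. It is a toy 1-D version in plain Python. The names `dynamic_conv1d`, `attention_weights`, and `param_count`, and the choice of driving the attention from the input mean, are assumptions made for illustration and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_kernels(kernels, attention):
    """Attention-weighted sum of n equally sized (flattened) kernels."""
    size = len(kernels[0])
    return [sum(a * k[i] for a, k in zip(attention, kernels)) for i in range(size)]


def conv1d_valid(signal, kernel):
    """Plain 1-D cross-correlation without padding."""
    width = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(width))
        for i in range(len(signal) - width + 1)
    ]


def dynamic_conv1d(signal, kernels, attention_weights):
    """Toy dynamic convolution (hypothetical helper).

    Assumption: the attention logits are a linear function of the input
    mean, standing in for a learned attention branch.
    """
    mean = sum(signal) / len(signal)
    attention = softmax([w * mean for w in attention_weights])
    kernel = mix_kernels(kernels, attention)
    return conv1d_valid(signal, kernel), attention


def param_count(n, c_out, c_in, k):
    """Kernel parameters of a k x k layer that keeps n static kernels."""
    return n * c_out * c_in * k * k
```

Under these assumptions, `param_count(4, 64, 64, 3)` is four times `param_count(1, 64, 64, 3)`, which is the n-fold parameter growth the abstract refers to.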


MicroNAS: Memory and Latency Constrained Hardware-Aware Neural Architecture Search for Time Series Classification on Microcontrollers

King, Tobias, Zhou, Yexu, Röddiger, Tobias, Beigl, Michael

arXiv.org Artificial Intelligence

Designing domain-specific neural networks is a time-consuming, error-prone, and expensive task. Neural Architecture Search (NAS) exists to simplify domain-specific model development, but there is a gap in the literature for time-series classification on microcontrollers. Therefore, we adapt the concept of differentiable neural architecture search (DNAS) to solve the time-series classification problem on resource-constrained microcontrollers (MCUs). We introduce MicroNAS, a domain-specific HW-NAS system that integrates DNAS, Latency Lookup Tables, dynamic convolutions and a novel search space specifically designed for time-series classification on MCUs. The resulting system is hardware-aware and can generate neural network architectures that satisfy user-defined limits on the execution latency and peak memory consumption. Our extensive studies on different MCUs and standard benchmark datasets demonstrate that MicroNAS finds MCU-tailored architectures that achieve performance (F1-score) close to that of state-of-the-art desktop models. We also show that our approach is superior in adhering to memory and latency constraints compared to domain-independent NAS baselines such as DARTS.


Dynamic Convolutional Neural Networks as Efficient Pre-trained Audio Models

Schmid, Florian, Koutini, Khaled, Widmer, Gerhard

arXiv.org Artificial Intelligence

The introduction of large-scale audio datasets, such as AudioSet, paved the way for Transformers to conquer the audio domain and replace CNNs as the state-of-the-art neural network architecture for many tasks. Audio Spectrogram Transformers are excellent at exploiting large datasets, creating powerful pre-trained models that surpass CNNs when fine-tuned on downstream tasks. However, current popular Audio Spectrogram Transformers are demanding in terms of computational complexity compared to CNNs. Recently, we have shown that, by employing Transformer-to-CNN Knowledge Distillation, efficient CNNs can catch up with and even outperform Transformers on large datasets. In this work, we extend this line of research and increase the capacity of efficient CNNs by introducing dynamic CNN blocks, constructed of dynamic non-linearities, dynamic convolutions and attention mechanisms. We show that these dynamic CNNs outperform traditional efficient CNNs, in terms of the performance-complexity trade-off and parameter efficiency, at the task of audio tagging on the large-scale AudioSet. Our experiments further indicate that the introduced dynamic CNNs achieve better performance on downstream tasks and scale up well, attaining Transformer performance and even outperforming them on AudioSet and several downstream tasks.


KernelWarehouse: Towards Parameter-Efficient Dynamic Convolution

Li, Chao, Yao, Anbang

arXiv.org Artificial Intelligence

Dynamic convolution learns a linear mixture of n static kernels weighted with their sample-dependent attentions, demonstrating superior performance compared to normal convolution. However, existing designs are parameter-inefficient: they increase the number of convolutional parameters by n times. This inefficiency, together with the optimization difficulty, has so far prevented the use of a significantly larger value of n (e.g., n > 100 instead of the typical setting n < 10) to push forward the performance boundary. In this paper, we propose KernelWarehouse, a more general form of dynamic convolution, which can strike a favorable trade-off between parameter efficiency and representation power. Its key idea is to redefine the basic concepts of "kernels" and "assembling kernels" in dynamic convolution from the perspective of reducing kernel dimension and increasing kernel number significantly. In principle, KernelWarehouse enhances convolutional parameter dependencies within the same layer and across successive layers via tactful kernel partition and warehouse sharing. Specifically, KernelWarehouse first sequentially divides a static kernel at any convolutional layer of a ConvNet into m disjoint kernel cells of the same dimensions, then computes each kernel cell as a linear mixture based on a predefined "warehouse" consisting of n kernel cells (e.g., n = 108), which is also shared across multiple neighboring convolutional layers, and finally replaces the static kernel by assembling its corresponding m mixtures in order, yielding a high degree of freedom to fit a desired parameter budget. To facilitate the learning of the attentions for summing up kernel cells, we also present a new attention function.
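The partition-and-assemble step described above can be sketched as follows, assuming flattened kernels and mixing coefficients that are already given. The function name `assemble_kernel` and the argument layout are hypothetical, and the paper's new attention function is not reproduced here.

```python
def assemble_kernel(warehouse, attention):
    """Build a flattened kernel from m mixtures of n shared warehouse cells.

    warehouse: n kernel cells, each a list of cell_size floats
    attention: m rows, each holding n mixing coefficients (assumed given;
               how they are computed is outside this sketch)
    """
    cell_size = len(warehouse[0])
    kernel = []
    for row in attention:
        if len(row) != len(warehouse):
            raise ValueError("each row needs one coefficient per warehouse cell")
        # One kernel cell is a linear mixture of all warehouse cells.
        cell = [sum(a * w[i] for a, w in zip(row, warehouse)) for i in range(cell_size)]
        # Cells are concatenated in order to form the full kernel.
        kernel.extend(cell)
    return kernel
```

In this sketch the stored kernel parameters are the n warehouse cells only, so their count does not grow with m or with the number of layers that share the warehouse. The per-cell mixing coefficients are extra and are not counted here.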


PAD-Net: An Efficient Framework for Dynamic Networks

He, Shwai, Ding, Liang, Dong, Daize, Liu, Boan, Yu, Fuqiang, Tao, Dacheng

arXiv.org Artificial Intelligence

Dynamic networks, e.g., Dynamic Convolution (DY-Conv) and the Mixture of Experts (MoE), have been extensively explored as they can considerably improve the model's representation power with acceptable computational cost. The common practice in implementing dynamic networks is to convert the given static layers into fully dynamic ones where all parameters are dynamic (at least within a single layer) and vary with the input. However, such a fully dynamic setting may cause redundant parameters and high deployment costs, limiting the applicability of dynamic networks to a broader range of tasks and models. The main contributions of our work are challenging the basic common sense in dynamic networks and proposing a partially dynamic network, namely PAD-Net, to transform the redundant dynamic parameters into static ones. We further design Iterative Mode Partition to partition dynamic and static parameters efficiently. Our method is comprehensively supported by large-scale experiments with two typical advanced dynamic architectures, i.e., DY-Conv and MoE, on both image classification and GLUE benchmarks. Encouragingly, we surpass the fully dynamic networks by +0.7% top-1 accuracy with only 30% dynamic parameters for ResNet-50 and by +1.9% average score in language understanding with only 50% dynamic parameters for BERT. Code will be released at: https://github.com/Shwai-He/PAD-Net.
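One way to picture a partially dynamic layer is a per-parameter mask that selects, for each weight, either an input-dependent value or a fixed static one. The sketch below assumes such a mask is already available. The helper names are hypothetical, and neither the Iterative Mode Partition procedure nor the exact weight formulation of the paper is reproduced.

```python
def partially_dynamic_weights(static_w, dynamic_w, mask):
    """Pick the dynamic value where mask is True, the static one elsewhere.

    static_w:  fixed parameters, identical for every input
    dynamic_w: input-dependent parameters computed for the current input
    mask:      booleans marking which positions stay dynamic (assumed given)
    """
    if not (len(static_w) == len(dynamic_w) == len(mask)):
        raise ValueError("all three sequences must have the same length")
    return [d if m else s for s, d, m in zip(static_w, dynamic_w, mask)]


def dynamic_fraction(mask):
    """Share of parameters that remain dynamic under the mask."""
    return sum(1 for m in mask if m) / len(mask)
```

With a mask that keeps 3 of 10 positions dynamic, `dynamic_fraction` returns 0.3, which corresponds to the "30% dynamic parameters" setting quoted above.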